Achieving High Output Quality under Limited Resources through Structure-based Spilling in XML Streams

نویسندگان

  • Mingzhu Wei
  • Elke A. Rundensteiner
  • Murali Mani
چکیده

Because of high volumes and unpredictable arrival rates, stream processing systems are not always able to keep up with input data resulting in buffer overflow and uncontrolled loss of data. To produce eventually complete results, load spilling, which pushes some fractions of data to disks temporarily, is commonly employed in relational stream engines. In this work, we now introduce “structurebased spilling”, a spilling technique customized for XML streams by considering the partial spillage of possibly complex XML elements. Such structure-based spilling brings new challenges. When a path is spilled, multiple paths may be affected. We analyze possible spilling effects on the query paths and how to execute the “reduced” query to produce partial results. To select the reduced query that maximizes output quality, we develop three optimization strategies, namely, OptR, OptPrune and ToX. We also examine the clean-up stage to guarantee that an entire result set is eventually generated by producing supplementary results. Our experimental study demonstrates that our proposed solutions consistently achieve higher quality results compared to the state-of-the-art techniques.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Continuously Providing Approximate Results under Limited Resources: Load Shedding and Spilling in XML Streams

Because of the high volume and unpredictable arrival rates, stream processing systems may not always be able to keep up with the input data streams, resulting in buffer overflow and uncontrolled loss of data. To continuously supply online results, two alternate solutions to tackle this problem of unpredictable failures of such overloaded systems can be identified. One technique, called load she...

متن کامل

Distributed Resource Allocation for Synchronous Fork and Join Processing Networks (Tech Report)

Many emerging information processing applications require applying various fork and join type operations such as correlation, aggregation, and encoding/decoding to data streams in real-time. Each operation will require one or more simultaneous input data streams and produce one or more output streams, where the processing may shrink or expand the data rates upon completion. Multiple tasks can b...

متن کامل

Distributed Resource Allocation for Synchronous Fork and Join Processing Networks (Technical Report)

Many emerging information processing applications require applying various fork and join type operations such as correlation, aggregation, and encoding/decoding to data streams in real-time. Each operation will require one or more simultaneous input data streams and produce one or more output streams, where the processing may shrink or expand the data rates upon completion. Multiple tasks can b...

متن کامل

Delivering Qos in Xml Data Stream Processing Using Load Shedding

In recent years, we have witnessed the emergence of new types of systems that deal with large volumes of streaming data. Examples include financial data analysis on feeds of stock tickers, sensorbased environmental monitoring, network track monitoring and click stream analysis to push customized advertisements or intrusion detection. Traditional database management systems (DBMS), which are ver...

متن کامل

Mining Data Streams under Dynamicly Changing Resource Constraints

Due to the inherent characteristics of data streams, appropriate mining techniques heavily rely on window-based processing and/or (approximating) data summaries. Because resources such as memory and CPU time for maintaining such summaries are usually limited, the quality of the mining results is affected in different ways. Based on Frequent Itemset Mining and an according Change Detection as se...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • PVLDB

دوره 3  شماره 

صفحات  -

تاریخ انتشار 2010